Background: Metabolic pathway prediction is one possible approach to address the problem in system biology of reconstructing an organism’s metabolic network from its genome sequence. Recently there have been developments in machine learning-based pathway prediction methods that conclude that machine learningbased approaches are similar in performance to the most used method, PathoLogic which is a rule-based method. One issue is that previous studies evaluated PathoLogic without taxonomic pruning which decreases its performance. Results: In this study, we update the evaluation results from previous studies to demonstrate that PathoLogic with taxonomic pruning outperforms previous machine learning-based approaches and that further improvements in performance need to be made for them to be competitive. Furthermore, we introduce mlXGPR, a XGBoostbased metabolic pathway prediction method based on the multi-label classification pathway prediction framework introduced from mlLGPR. We also improve on this multi-label framework by utilizing correlations between labels using classifier chains. We propose a ranking method that determines the order of the chain so that lower performing classifiers are placed later in the chain to utilize the correlations between labels more. We evaluate mlXGPR with and without classifier chains on singleorganism and multi-organism benchmarks. Our results indicate that mlXGPR outperform other previous pathway prediction methods including PathoLogic with taxonomic pruning in terms of hamming loss, precision and F1 score on single organism benchmarks. Conclusions: The results from our study indicate that the performance of machine learning-based pathway prediction methods can be substantially improved and can even outperform PathoLogic with taxonomic pruning.
Loading....